Model-Free $\delta$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time $H_\infty$ Tracking Control
This paper presents a $\delta$-PI algorithm, based on the damped Newton method, for the $H_\infty$ tracking control problem of unknown continuous-time nonlinear systems. A discounted performance function and an augmented system are used to obtain the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The tracking HJI equation is a nonlinear partial differential equation; traditional reinforcement learning methods for solving it are mostly based on the Newton method, which usually guarantees only local convergence and requires a good initial guess. A generalized tracking Bellman equation is first derived from the damped Newton iteration operator equation. The $\delta$-PI algorithm then seeks the optimal solution of the tracking HJI equation by iteratively solving this generalized tracking Bellman equation. Both on-policy and off-policy $\delta$-PI reinforcement learning methods are provided. The off-policy $\delta$-PI algorithm is model-free: it can be performed without a priori knowledge of the system dynamics. A neural-network (NN) based implementation scheme for the off-policy $\delta$-PI algorithm is shown. The suitability of the model-free $\delta$-PI algorithm is illustrated with a nonlinear system simulation.
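The contrast the abstract draws between the Newton method and its damped variant can be seen on a scalar toy problem. The sketch below is only an illustration, not the paper's algorithm: it solves arctan(x) = 0 rather than the tracking HJI equation, and it assumes that $\delta$ plays the role of a fixed damping step in (0, 1]. The function name `newton_iterate`, the starting point 3.0, and the step 0.2 are hypothetical choices made for this example.

```python
import math


def newton_iterate(f, df, x0, delta=1.0, steps=50):
    """Run `steps` iterations of x <- x - delta * f(x) / df(x).

    delta = 1.0 is the plain Newton method; 0 < delta < 1 damps each step.
    """
    x = x0
    for _ in range(steps):
        x = x - delta * f(x) / df(x)
    return x


# Toy root-finding problem (an assumption for illustration): arctan(x) = 0.
f = math.atan


def df(x):
    # Derivative of arctan(x).
    return 1.0 / (1.0 + x * x)


x0 = 3.0  # deliberately poor initial guess

# Plain Newton overshoots further on every step from this starting point.
plain = newton_iterate(f, df, x0, delta=1.0, steps=5)

# Damped Newton takes shorter steps and settles at the root x = 0.
damped = newton_iterate(f, df, x0, delta=0.2, steps=200)

print("plain Newton after 5 steps:   ", plain)
print("damped Newton after 200 steps:", damped)
```

From the same poor initial guess, the undamped iteration moves away from the root while the damped one approaches it, at the cost of slower (linear) convergence near the solution. This is the trade-off that motivates a damped scheme when a good initial guess is not available.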
Teaching Machines to Think Like Us
Can intelligence be taught to robots? Advances in physical reservoir computing, a technology that makes sense of brain signals, could contribute to creating artificial intelligence machines that think like us. In Applied Physics Letters, from AIP Publishing, researchers from the University of Tokyo outline how a robot could be taught to navigate through a maze by electrically stimulating a culture of brain nerve cells connected to the machine. These nerve cells, or neurons, were grown from living cells and acted as the physical reservoir for the computer to construct coherent signals. The signals are regarded as homeostatic signals, telling the robot the internal environment was being maintained within a certain range and acting as a baseline as it moved freely through the maze.